arXiv: 1508.06321v2 [stat.ME] 13 Sep 2015 


When large n is not enough—Distribution-free Interval 
Estimators for Ratios of Quantiles 

Luke A. Prendergast and Robert G. Staudte

Department of Mathematics and Statistics, La Trobe University 
Melbourne, Victoria, Australia, 3086 

13 September, 2015 


Abstract 

Ratios of sample percentiles or of quantiles based on a single sample are often published
for skewed income data to illustrate aspects of income inequality, but distribution-free
confidence intervals for such ratios are not available in the literature. Here we derive
and compare two large-sample methods for obtaining such intervals. They both require
good distribution-free estimates of the quantile density at the quantiles of interest, and
such estimates have recently become available. Simulation studies for various sample
sizes are carried out for Pareto, lognormal and exponential distributions, as well as fitted
generalized lambda distributions, to determine the coverage probabilities and widths of
the intervals. Robustness of the estimators to contamination or a positive proportion of
zero incomes is examined via influence functions and simulations. The motivating example
is Australian household income data where ratios of quantiles measure inequality,
but of course these results apply equally to data from other countries.

1 Introduction


Ratios of percentiles from a single population may be of direct interest in many disciplines,
but in particular they are very often used as a simple measure of income inequality. For example,
in a recent brief discussion of income inequality measures, De Maio (2007) remarks that decile
ratios are simple but effective. Also, in the inequality literature one often finds estimated
ratios of quantiles plotted against the year in which the samples are taken, to illustrate
whether inequality is growing or decreasing over time. Of course such plots can be misleading,
and what is required are inferential methods based on sample ratios.

Recent contributions to inference for ratios of quantiles based on two independent samples
are found in Bonett & Price (2002) and Cheng & Wu (2010). However, to our knowledge there
are no published results on inference for ratios of quantiles based on a single sample.

This paper presents three main results. First, large-sample distribution-free
confidence intervals for ratios of quantiles based on standard theory have reliable coverage
for moderate sample sizes. Second, even for samples of size 10,000, the standard errors of
ratio estimators cannot be ignored; thus one cannot assume that sample ratios are accurate
just because the sample size is ‘large’. Third, such procedures are resistant to outliers
and to the presence of a small proportion of zero incomes in the population. The same cannot
be said for most inequality measures, as shown by Cowell & Victoria-Feser (1996), although
progress in robustifying some of them has been achieved; see Cowell & Victoria-Feser (2003)
and references therein.

In Section 2 we examine income data from the Australian Bureau of Statistics, and illustrate
how our results can provide useful inferential information. Then in Section 3 we find
distribution-free standard errors for the ratio of quantiles, which require distribution-free
estimates of the quantile density at the two quantiles defining the ratio. Two interval estimators
are described, one based on the studentized log-transformation and the other on a variance
stabilization transformation. Simulation studies in Section 4 show that these intervals rarely
have coverage below the nominal level for several distributions that are commonly assumed
for income populations, and that the intervals based on variance stabilization have more
accurate coverage and smaller widths for small to moderate sample sizes. Similar good results
are obtained for data fitted by the generalized lambda distribution; these are relegated to
the Appendix, Section 7.4. In Section 5 the effects of contamination by a point mass of zero
incomes or infinitesimal contamination are studied via simulations and influence functions.
In Section 7.5 interval estimators for the difference between two independent ratios are found
effective. The R script for computing the intervals is found in Section 7.6, and
further research is suggested in Section 6.

2 Australian Bureau of Statistics income data 

Measuring household income is a complicated task carried out by governmental departments,
including the Australian Bureau of Statistics, whose annual reports are available at ABS
(2011). The gross household income per week is of interest, but households differ so much in
size that the equivalized disposable income (EWI) is also found, which the ABS defines as
‘... the amount of disposable cash income that a single person household would require to
maintain the same standard of living as the household in question, regardless of the size or
composition of the latter.’

In Table 1 we list ratios of percentiles from ABS (2011). While details of how the percentiles
were calculated are not reported, the sample size of households each year ranges from
9,345 for 2007 to 18,071 for 2009.

In Figure 1 are histograms of the EWI data for the financial years beginning 2005 and
2011, listed in Table 7 of the Appendix, after exclusion of zero incomes and incomes greater than
$2000. Below them are density plots of the ‘reconstructed’ data sets from which we will
obtain our quantile estimates and standard errors. Superimposed on the density plots are 
gamma densities whose parameters are estimated by the method of moments. We are not 
advocating these gamma models for inference regarding quantiles, but rather we generate 
random samples from them to assist in assessing interval coverage for such data. 

In Table 2 we report the results for our reconstructed data. VST and ‘Stud’ refer to 95%
interval estimators, introduced in Section 3.2, based on variance stabilization and
studentization respectively. As can be seen, the studentized and VST intervals are identical to
two decimal places, which is due to the large sample sizes. It should be noted that the widths
of the intervals are not so narrow as to make the intervals redundant; rather, they are
themselves informative when reported with the ratio point estimates. Also shown are results
for 10,000 simulation runs from gamma distributed data with parameters set to those used
to overlay the densities in Figure 1. They indicate excellent coverage of the intervals when
sampling from the fitted gamma distributions, with approximately the same results for both
methods.

Let $z_\alpha$ denote the $100\alpha\%$ quantile from the standard normal distribution. A test of
$\rho_{2005} = \rho_{2011}$ for the ratio P90/P10 against a significant difference would reject at level 0.05
when $S = |\ln(\hat\rho_{2005}) - \ln(\hat\rho_{2011})| > z_{0.975} \times \mathrm{SE}$, where
$\mathrm{SE} = \mathrm{SE}[\ln(\hat\rho_{2005}) - \ln(\hat\rho_{2011})] = \{\mathrm{SE}_{2005}^2 + \mathrm{SE}_{2011}^2\}^{1/2}$,
and $\mathrm{SE}_{2005} = \mathrm{SE}[\ln(\hat\rho_{2005})] = 0.0105$ and $\mathrm{SE}_{2011} = \mathrm{SE}[\ln(\hat\rho_{2011})] = 0.0088$. Now
$|S/\mathrm{SE}| = 2.32 > 1.96$, so the P90/P10 ratios differ significantly for the years 2005 and 2011.
Formalities are given in Section 7.5.
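The arithmetic of this test can be checked with a few lines of Python. This is only a sketch using the two standard errors and the standardized statistic 2.32 quoted above; the variable names are ours, and the implied value of S is back-calculated rather than taken from the data.

```python
import math
from statistics import NormalDist

# Standard errors of the two log-ratios, as quoted in the text.
se_2005, se_2011 = 0.0105, 0.0088

# Independent samples: the variances of the two log-ratios add.
se_diff = math.sqrt(se_2005**2 + se_2011**2)

# Two-sided critical value at level 0.05.
z_crit = NormalDist().inv_cdf(0.975)

# The test rejects when |ln(ratio_2005) - ln(ratio_2011)| exceeds this threshold.
threshold = z_crit * se_diff

# Difference of log-ratios implied by the reported |S/SE| = 2.32.
implied_s = 2.32 * se_diff

# Two-sided p-value for the reported standardized statistic.
p_value = 2 * (1 - NormalDist().cdf(2.32))

print(round(se_diff, 4), round(threshold, 4), round(implied_s, 4), round(p_value, 4))
```

The combined standard error is 0.0137, so a difference of log-ratios larger than about 0.027 in absolute value is significant at the 0.05 level.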

3 Distribution-free confidence intervals 

3.1 Distribution-free standard errors for ratios of quantiles 

Let $F$ be a continuous distribution with positive domain. Define the quantile function as the
inverse $G(p) = F^{-1}(p) = \inf\{x : F(x) \geq p\}$, $0 < p < 1$. When $F$ is understood, we often
write $x_p$ for the $p$th quantile $G(p)$, which is also called the $100p$th percentile. For any choices
of $p \neq q$ in $(0,1)$ we are interested in the ratios

$$\rho = \rho(p,q) = \frac{x_p}{x_q} . \qquad (1)$$

One can estimate the $p$th quantile $x_p = G(p)$ by $X_{([np]+1)}$, the $([np]+1)$st order statistic of a
sample of size $n$ from $F$. However, the Hyndman & Fan (1996) quantile estimator $\hat{x}_p$, which
is a linear combination of two adjacent order statistics, generally has much less bias and
similar variance, so in the sequel we estimate $x_p$ by $\hat{x}_p$. This estimator is Type 8 of the quantile
estimators in the software package R (R Development Core Team, 2008). Given such a single
sample, and fixed $0 < p, q < 1$, we estimate the ratio $\rho = \rho(p,q) = x_p/x_q$ by $\hat\rho = \hat{x}_p/\hat{x}_q$.
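As an illustration of the point estimator, the following Python sketch computes a Type 8 sample quantile and the resulting ratio. It assumes the usual Hyndman & Fan Type 8 rule (plotting positions $(k - 1/3)/(n + 1/3)$ with linear interpolation between adjacent order statistics); the function names are ours and this is not the authors' R script.

```python
import math

def quantile_type8(data, p):
    """Hyndman-Fan Type 8 sample quantile (assumed form, see lead-in)."""
    xs = sorted(data)
    n = len(xs)
    pos = p * (n + 1.0 / 3.0) + 1.0 / 3.0   # fractional 1-based order-statistic index
    j = math.floor(pos + 1e-12)             # small fuzz guards against rounding error
    h = pos - j                             # interpolation weight
    lo = xs[min(max(j, 1), n) - 1]          # x_(j), clamped to the sample range
    hi = xs[min(max(j + 1, 1), n) - 1]      # x_(j+1), clamped likewise
    return (1 - h) * lo + h * hi

def quantile_ratio(data, p, q):
    """Point estimate of x_p / x_q from a single sample."""
    return quantile_type8(data, p) / quantile_type8(data, q)

toy = [1, 2, 3, 4, 5]                       # toy data for illustration only
print(quantile_type8(toy, 0.5), quantile_ratio(toy, 0.75, 0.25))
```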

Assuming $F$ has a positive and continuous derivative $f = F'$ on its support, the derivative
of the quantile function $G = F^{-1}$ is given by $G'(p) = g(p) = 1/f(x_p)$; this is the quantile
density of Parzen (1979), earlier called the sparsity index by Tukey (1965). It arises in first
order asymptotic covariance expressions, see (David, 1981, Ch.2) or (DasGupta, 2006, Ch.7),
where it is shown that as $n$ increases without bound $E(\hat{x}_p) \doteq x_p$; and, for $0 < p < q < 1$,

$$n\,\mathrm{Var}(\hat{x}_p) \doteq p(1-p)\,g^2(p) \equiv \sigma_p^2$$

$$n\,\mathrm{Cov}(\hat{x}_p,\hat{x}_q) \doteq p(1-q)\,g(p)\,g(q) \equiv \sigma_{p,q} , \qquad (2)$$

where $\doteq$ means that lower order terms are omitted. For the case $0 < q < p < 1$,
$n\,\mathrm{Cov}(\hat{x}_p,\hat{x}_q) \doteq q(1-p)\,g(p)\,g(q) \equiv \sigma_{p,q}$.

It follows that for $0 < p < q < 1$ a first-order approximation to the correlation between
$\hat{x}_p$, $\hat{x}_q$ is $\mathrm{Corr}(\hat{x}_p,\hat{x}_q) \doteq \sigma_{p,q}/(\sigma_p\sigma_q) = \sqrt{p(1-q)/\{q(1-p)\}} > 0$; for $0 < q < p < 1$ it is
$\sqrt{q(1-p)/\{p(1-q)\}} > 0$. This asymptotic correlation is notably free of $F$ and sample sizes,
and must be taken into account in computing standard errors of $\hat\rho = \hat{x}_p/\hat{x}_q$. The classical
formula (Johnson et al., 1993, p.50) for the variance of a ratio of random variables, provided
the denominator has positive support, is given in terms of means, variances and covariance of
its components. We only consider $F$ with positive support, and thus when applied to sample
quantile estimators with $0 < p, q < 1$ this formula for the variance of the ratio can be written:

$$n\,\mathrm{Var}(\hat\rho_{p,q}) \doteq n\,\mathrm{Var}(\hat{x}_p - \rho\,\hat{x}_q)/x_q^2 = a_0 + a_1\rho + a_2\rho^2 \equiv h^2(\rho) , \qquad (3)$$

where $h^2(\rho)$ is the quadratic with constants defined in terms of (2) by $a_0 = \sigma_p^2/x_q^2$,
$a_1 = -2\sigma_{p,q}/x_q^2$ and $a_2 = \sigma_q^2/x_q^2$. Note that $a_0$, $a_1$ and $a_2$ are free of scale and sample size. The
quadratic $h^2(\rho) > 0$ for all $\rho$ because $a_0 > 0$ and its discriminant $a_1^2 - 4a_0a_2 < 0$; the latter
inequality follows from $\mathrm{Corr}(\hat{x}_p,\hat{x}_q) < 1$.
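To make the quadratic concrete, here is a small Python sketch that evaluates $a_0$, $a_1$, $a_2$ and $h^2(\rho)$ when the quantiles and quantile densities are known. The EXP(1) distribution is used purely as an example (its quantile function is $-\ln(1-p)$ and its quantile density is $1/(1-p)$); the function names and the choice $n = 1000$ are ours.

```python
import math

def exp_quantile(p):
    """x_p for the EXP(1) distribution."""
    return -math.log(1 - p)

def exp_qdensity(p):
    """Quantile density g(p) = 1/f(x_p) for EXP(1)."""
    return 1 / (1 - p)

def ratio_variance_coeffs(p, q, quantile, qdensity):
    """Coefficients of h^2(rho) = a0 + a1*rho + a2*rho^2, plus rho itself."""
    xp, xq = quantile(p), quantile(q)
    gp, gq = qdensity(p), qdensity(q)
    var_p = p * (1 - p) * gp**2                   # sigma_p^2
    var_q = q * (1 - q) * gq**2                   # sigma_q^2
    cov = min(p, q) * (1 - max(p, q)) * gp * gq   # sigma_{p,q}, valid for either ordering
    return var_p / xq**2, -2 * cov / xq**2, var_q / xq**2, xp / xq

a0, a1, a2, rho = ratio_variance_coeffs(0.9, 0.1, exp_quantile, exp_qdensity)
h2 = a0 + a1 * rho + a2 * rho**2
se_n1000 = math.sqrt(h2 / 1000)   # first-order standard error of the ratio when n = 1000
print(round(rho, 3), round(h2, 1), round(se_n1000, 3))
```

The negative discriminant can be confirmed by checking that `a1**2 - 4*a0*a2` is below zero for any admissible pair of quantile levels.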

These results suggest that the large sample variance of $\hat\rho = \hat\rho_{p,q}$ is approximately $h^2(\rho)/n$,
and, because the large sample squared bias is of smaller order, the standard error can be
approximated by $\mathrm{SE}(\hat\rho) \doteq h(\rho)/\sqrt{n}$. This formula has been derived for known constants $a_0$,
$a_1$ and $a_2$. To make it distribution-free, one needs to replace $x_q$ by $\hat{x}_q$, $g(p)$ by $\hat{g}(p)$ and $g(q)$
by $\hat{g}(q)$, where $\hat{g}$ is a quantile density estimate such as the kernel density estimator described
in Appendix 7.3. When this is done, we obtain the distribution-free standard error estimate
$\widehat{\mathrm{SE}}(\hat\rho) = \hat{h}(\hat\rho)/\sqrt{n}$.

3.2 Two interval estimators for ratios of quantiles

We compare distribution-free large-sample confidence intervals for $\rho = x_p/x_q$, where
$0 < p, q < 1$. The distribution of $\sqrt{n}(\hat\rho - \rho)$ is asymptotically normal but quite skewed for
moderate sample sizes, so transformations are employed to normalize its distribution and
derive confidence intervals. This methodology is standard, so here we only present the final
results for the log-transformation and a variance stabilizing transformation, with details given
in Section 7.2.


Studentized log-transformed ratios. 


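One traditional approach for finding confidence intervals of a ratio of statistics such as $\hat\rho = \hat\rho_{p,q}$
is to first find approximations to the asymptotic mean and variance of the estimated log-ratio
$\hat\theta = \ln(\hat\rho)$, use the studentized version of this estimator to obtain a confidence interval for
$\theta = \ln(\rho)$ and then exponentiate this interval. In terms of earlier definitions (2), the asymptotic mean
and variance of $\hat\theta = \ln(\hat\rho)$ are shown in Appendix 7.2 to be:

$$E(\hat\theta) \doteq \ln(\rho) - \frac{1}{2n}\left(\frac{\sigma_p^2}{x_p^2} - \frac{\sigma_q^2}{x_q^2}\right) \qquad (4)$$

$$\mathrm{Var}(\hat\theta) \doteq \frac{1}{n}\left(\frac{\sigma_p^2}{x_p^2} + \frac{\sigma_q^2}{x_q^2} - \frac{2\sigma_{p,q}}{x_p x_q}\right) = \frac{1}{n}\,\frac{h^2(\rho)}{\rho^2} , \qquad (5)$$

where $h^2(\rho)$ is given by (3). The asymptotic normality of $\hat\theta$ then leads to the nominal
$100(1-\alpha)\%$ confidence interval for $\rho$:

$$[L,U]_S = \hat\rho\,\exp\left\{\mp z_{1-\alpha/2}\sqrt{\mathrm{Var}(\hat\theta)}\right\} . \qquad (6)$$

To make the intervals (6) distribution-free, the values of $x_p$, $x_q$, $\sigma_p$, $\sigma_q$ and $\sigma_{p,q}$ appearing in
$\mathrm{Var}(\hat\theta)$ need to be consistently estimated, and $\mathrm{Var}(\hat\theta)$ replaced by $\widehat{\mathrm{Var}}(\hat\theta)$. It is also noted in
Appendix 7.2 that the widths of these intervals behave like:

$$W_S \doteq 2\rho\,z_{1-\alpha/2}\sqrt{\mathrm{Var}(\hat\theta)} = \frac{2\,z_{1-\alpha/2}\,h(\rho)}{\sqrt{n}} . \qquad (7)$$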
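A minimal Python sketch of the studentized log-ratio interval follows. It takes the two quantiles and the two quantile densities as inputs; in practice these would be estimates, but here the true EXP(1) values with $n = 1000$ are plugged in as stand-ins (an assumption made only so the example is self-contained). The function name is ours.

```python
import math
from statistics import NormalDist

def studentized_interval(xp, xq, gp, gq, p, q, n, level=0.95):
    """Interval for x_p/x_q obtained by exponentiating a normal interval for the log-ratio."""
    rho = xp / xq
    var_p = p * (1 - p) * gp**2
    var_q = q * (1 - q) * gq**2
    cov = min(p, q) * (1 - max(p, q)) * gp * gq
    # First-order variance of the log-ratio (delta method).
    var_theta = (var_p / xp**2 + var_q / xq**2 - 2 * cov / (xp * xq)) / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half = z * math.sqrt(var_theta)
    return rho * math.exp(-half), rho * math.exp(half)

# Stand-in inputs: true EXP(1) quantiles and quantile densities.
p, q, n = 0.9, 0.1, 1000
xp, xq = -math.log(1 - p), -math.log(1 - q)
gp, gq = 1 / (1 - p), 1 / (1 - q)
lower, upper = studentized_interval(xp, xq, gp, gq, p, q, n)
print(round(lower, 2), round(upper, 2))
```

The interval is symmetric about the point estimate on the log scale, so the product of its endpoints equals the squared ratio.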


Variance stabilized ratios of quantiles. 

Let $l(\rho) = a_1 + 2a_2\rho$ be the derivative of the quadratic $h^2(\rho) = a_0 + a_1\rho + a_2\rho^2$ defined
in (3), and let $D^2 = 4a_0a_2 - a_1^2$ be the negative of its discriminant. Then as explained in
Appendix 7.3, one can derive large-sample nominal $100(1-\alpha)\%$ confidence intervals:

$$[L,U]_V = \hat\rho + \frac{\hat{D}}{2\hat{a}_2}\left[\sinh\left\{\sinh^{-1}\left(\frac{\hat{l}(\hat\rho)}{\hat{D}}\right) \mp z_{1-\alpha/2}\sqrt{\frac{\hat{a}_2}{n}}\right\} - \frac{\hat{l}(\hat\rho)}{\hat{D}}\right] . \qquad (8)$$

The hats appearing on $a_2$, $l$ and $D$ indicate that they are obtained by replacing
$x_p$, $\sigma_p$, $\sigma_{p,q}$ etc. by consistent distribution-free estimates.

The asymptotic widths of these intervals are, up to first order, the same as those derived
by the log-transformation (7). Thus the large-sample coverage and widths of the two intervals
$[L,U]_S$ and $[L,U]_V$ are the same; so in Section 4 we compare their finite sample properties.
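A hedged Python sketch of a variance-stabilized interval is given next. It assumes the inverse hyperbolic sine transformation that stabilizes a variance of the quadratic form $h^2(\rho)/n$, and inverts it as $\{D\sinh(\cdot) - a_1\}/(2a_2)$, which is one algebraic way of writing the interval since $l(\rho) = a_1 + 2a_2\rho$. True EXP(1) quantities with $n = 1000$ stand in for estimates; the function name is ours.

```python
import math
from statistics import NormalDist

def vst_interval(xp, xq, gp, gq, p, q, n, level=0.95):
    """Variance-stabilized interval for x_p/x_q (asinh form, see lead-in assumptions)."""
    rho = xp / xq
    a0 = p * (1 - p) * gp**2 / xq**2
    a1 = -2 * min(p, q) * (1 - max(p, q)) * gp * gq / xq**2
    a2 = q * (1 - q) * gq**2 / xq**2
    d = math.sqrt(4 * a0 * a2 - a1**2)      # D: square root of minus the discriminant
    ell = a1 + 2 * a2 * rho                 # l(rho): derivative of the quadratic at rho
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    centre = math.asinh(ell / d)            # transformed point estimate
    shift = z * math.sqrt(a2 / n)           # half-width on the transformed scale
    lower = (d * math.sinh(centre - shift) - a1) / (2 * a2)
    upper = (d * math.sinh(centre + shift) - a1) / (2 * a2)
    return lower, upper

# Stand-in inputs: true EXP(1) quantiles and quantile densities.
p, q, n = 0.9, 0.1, 1000
xp, xq = -math.log(1 - p), -math.log(1 - q)
gp, gq = 1 / (1 - p), 1 / (1 - q)
lower, upper = vst_interval(xp, xq, gp, gq, p, q, n)
print(round(lower, 2), round(upper, 2))
```

With these inputs the width is close to the first-order value $2 z_{1-\alpha/2} h(\rho)/\sqrt{n}$, in line with the asymptotic equivalence of the two intervals.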


4 Simulation Studies 

In this section we report simulated coverage probabilities and mean interval widths for several
distributions. Extensive simulations were carried out for the LN(0,1), EXP(1), three $\chi^2$,
Pareto(1), Pareto(1.5) and Pareto(2) distributions. By ‘Pareto($a$)’ we mean the Type II
Pareto distribution with shape parameter $a$ and distribution function $F_a(x) = 1 - (1 + x)^{-a}$
for $a, x > 0$. Commands for generating data or finding quantiles from this distribution are
available in the R package actuar. We report the results for three of these
distributions and remark that similar results were obtained for the other distributions.
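For readers working outside R, the Type II Pareto quantile function is easy to code directly. The sketch below assumes the distribution function $F_a(x) = 1 - (1+x)^{-a}$ (unit scale), which reproduces the ratio 39.97 quoted later for Pareto(2) with $p = 0.9$, $q = 0.1$; the function names and the seed are ours.

```python
import random

def pareto2_quantile(p, a):
    """x_p for F_a(x) = 1 - (1 + x)^(-a), obtained by inverting F_a."""
    return (1 - p) ** (-1 / a) - 1

def pareto2_sample(n, a, rng):
    """Inverse-transform sampling: apply the quantile function to uniform draws."""
    return [pareto2_quantile(rng.random(), a) for _ in range(n)]

rho = pareto2_quantile(0.9, 2) / pareto2_quantile(0.1, 2)
sample = pareto2_sample(1000, 2, random.Random(1))
print(round(rho, 2), len(sample))
```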


4.1 Moderate sample sizes 

To ensure that the interval widths are considered in the correct context, in Table 3 we provide
the true quantile ratios for the distributions considered. In Table 4 we report the simulated
coverage probabilities (cp) and average widths (w) for the interval estimators associated
with the LN(0,1), $\chi^2$ and the Pareto(2) distributions for various choices of $p$ and $q$ and
three sample sizes $n$ = 100, 250 and 500. In almost all cases, the coverage probabilities
are between 0.95 and 0.97. In general, the VST intervals are slightly narrower than the
studentized intervals and consequently slightly less conservative. This is consistent with the
folkloric view amongst applied statisticians that variance stabilization generally leads to more
powerful tests than studentization for moderate sample sizes; a view recently reinforced by
examples in Kulinskaya et al. (2010), Staudte (2014) and theory in Morgenthaler & Staudte
(2012).

The results of Table 4 were restricted to the special case of $q = 1 - p$ for choices of
$p$ = 0.05, 0.1, 0.2, 0.8, 0.9, 0.95. However, it will be useful to consider the coverage probabilities
for a much wider choice of $p$ and $q$. We will now consider coverage probabilities for the VST
and studentized intervals for two of the distributions. Additionally, we use the log-normal
QOR only, since the smaller computation cost means that we can use a large number of
iterations over many choices of $p$ and $q$.

As in Prendergast & Staudte (2014), we use contour plots to assess coverage probability
over a wider range of $p$, $q$ combinations. In Figure 2 we plot the simulated coverage probabilities
based on 10,000 replications for all combinations of $p$ and $q$ from 0.05, 0.06,..., 0.95 for
data sampled from the LOGN(0, 1) distribution. Green indicates ideal coverage of between
0.95 and 0.96 (i.e. at least nominal) and light blue indicates slightly conservative intervals.
When $n = 100$ we can see that the intervals can be very conservative (i.e. the dark blue
regions) when $p$ and $q$ are close together. However, such choices of $p$ and $q$ do not typically
provide much insight since the two quantiles are approximately the same. For other choices of $p$ and
$q$ the coverages are quite good, despite the small sample size of $n = 100$. Typically, the
VST interval is the marginally better performer with coverages slightly closer to the nominal
level of 0.95. As $n$ is increased to 250 and then to 500, we see that the coverage probabilities
become even closer to nominal with a tendency for slightly conservative intervals.
Very rarely does the simulated coverage fall below the nominal coverage of 0.95, highlighting
reliable performance for this distribution.

In Figure 3 are shown the simulated coverage probability contour plots for the Pareto(2)
distribution. In general, the intervals are slightly more conservative than they were for the
log normal although lower than nominal coverage is very rare. Again, $p \approx q$ results in the
most conservative intervals, especially for smaller $n$, although in practice this scenario is
trivial, at best. Coverage improves for increasing $n$ with most reported coverages between
0.95 and 0.97 when $n = 500$. In the next section we will see that further increases of the
sample sizes continue to improve coverage.

4.2 Large sample sizes

In Table 5 we report large sample size empirical coverage probabilities and mean widths for the same
intervals and distributions summarized in Table 4, and one can see that the coverage probabilities
are closer to nominal. Further, the widths of the intervals are not so small as to justify
the use of point estimates only. This is especially true for the Pareto(2) distribution where
even for $n = 10,000$ the mean widths are still large relative to the ratio being estimated
(e.g., $p = 0.9$, $q = 0.1$ with $\rho = 39.97$, where the mean interval width is 5.89 for both intervals).
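The size of this width can be anticipated from first-order theory. The following Python sketch evaluates the asymptotic width $2 z_{1-\alpha/2} h(\rho)/\sqrt{n}$ for Pareto(2) with $p = 0.9$, $q = 0.1$ and $n = 10,000$, assuming the Type II Pareto form $F_a(x) = 1 - (1+x)^{-a}$ and using the true quantile density rather than an estimate; the function names are ours.

```python
import math
from statistics import NormalDist

def pareto2_quantile(p, a=2.0):
    """x_p for F_a(x) = 1 - (1 + x)^(-a)."""
    return (1 - p) ** (-1 / a) - 1

def pareto2_qdensity(p, a=2.0):
    """g(p): derivative of the quantile function with respect to p."""
    return (1 - p) ** (-1 / a - 1) / a

def asymptotic_width(p, q, n, level=0.95):
    """First-order interval width 2 * z * h(rho) / sqrt(n), together with rho."""
    xp, xq = pareto2_quantile(p), pareto2_quantile(q)
    gp, gq = pareto2_qdensity(p), pareto2_qdensity(q)
    rho = xp / xq
    var_p = p * (1 - p) * gp**2
    var_q = q * (1 - q) * gq**2
    cov = min(p, q) * (1 - max(p, q)) * gp * gq
    h2 = (var_p - 2 * rho * cov + rho**2 * var_q) / xq**2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * math.sqrt(h2 / n), rho

width, rho = asymptotic_width(0.9, 0.1, 10_000)
print(round(width, 2), round(rho, 2), round(width / rho, 3))
```

The first-order width comes out at roughly 5.8, close to the simulated mean width of 5.89, and it is still about 15% of the ratio being estimated.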

5 Effects of contamination 

5.1 Mixture distribution with spike at zero 

In addition to the large sample simulations conducted above, we note that many samples of
income data include a small percentage of zero values (e.g. for households with zero income
or households in debt rounded upwards to zero). We therefore examine the following mixture
model:

$$F_\epsilon = (1-\epsilon)F + \epsilon\Delta_0 \qquad (9)$$

where $F$ is the positive income distribution, $\Delta_0$ places all its mass at the point 0 and
$0 < \epsilon < 1$ is the proportion of the mixture that are zeroes.
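The effect of the point mass on the quantiles can be seen in a short Python sketch. It assumes a base distribution $F$ supported on the positive half-line, so that the mixture has distribution function $\epsilon + (1-\epsilon)F(x)$ for $x \geq 0$; EXP(1) is used as an illustrative base, and the function names are ours.

```python
import math
import random

def mixture_quantile(p, eps, base_quantile):
    """p-th quantile of (1 - eps) * F + eps * (point mass at 0), with F on (0, inf)."""
    if p <= eps:
        return 0.0                      # the zero mass already covers probability eps
    return base_quantile((p - eps) / (1 - eps))

def mixture_sample(n, eps, base_quantile, rng):
    """Draw zeroes with probability eps, otherwise sample F by inversion."""
    return [0.0 if rng.random() < eps else base_quantile(rng.random()) for _ in range(n)]

def exp_quantile(u):
    """Quantile function of EXP(1), used here only as an example base distribution."""
    return -math.log(1 - u)

x05 = mixture_quantile(0.05, 0.05, exp_quantile)    # sits exactly on the zero mass
x525 = mixture_quantile(0.525, 0.05, exp_quantile)  # equals the base median
sample = mixture_sample(2000, 0.05, exp_quantile, random.Random(7))
print(x05, round(x525, 4), sum(1 for x in sample if x == 0.0))
```

When $\epsilon$ equals the quantile level, the population quantile lies on the boundary of the zero mass, which is why a large share of sample estimates of that quantile are exactly zero.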

In Table 6 we report simulation results for zeroes mixed with the LOGN(0,1), $\chi^2$ and the
Pareto(2) distributions, respectively, with probabilities $(\epsilon, 1-\epsilon)$. For simplicity we report
only the coverage probabilities and only for the VST intervals; similar results are obtained for
the studentized intervals. When $\epsilon = 0.05$ we do not report results for any ratio involving
$x_{0.05}$ since approximately half of the estimates will equate to zero. Overall the coverage
probabilities are close to nominal with a tendency for conservative intervals when estimating
$x_{0.05}$. In this case a mass of zeroes lying close to one of the quantiles in the ratio will have a
small effect on the estimated density in that vicinity.


Table 5 here.

Table 6 here.

5.2 Robustness properties

For background material on robustness concepts such as influence functions and breakdown 
points, see Hampel et al. (1986) or Staudte & Sheather (1990). 


Influence functions. 


Extending the zero mixture distribution from (9), define the ‘contamination’ distribution
which places positive probability $\epsilon$ on $z$ (the contamination point) and $1-\epsilon$ on the distribution
$F$. Formally, it is defined for each $x$ by $F_\epsilon^{(z)}(x) = (1-\epsilon)F(x) + \epsilon I[x \geq z]$. The influence function
for any functional $T(F)$ is then defined for each $z$ as $\mathrm{IF}(z; T, F) = \lim_{\epsilon\to 0}\frac{\partial}{\partial\epsilon}T(F_\epsilon^{(z)})$
(see Hampel, 1974). The influence function of the $p$th quantile $x_p = G(F; p) = F^{-1}(p)$ is
well-known (Staudte & Sheather, 1990, p.59) to be

$$\mathrm{IF}(z; G(\cdot,p), F) = \{p - I[x_p \geq z]\}\,g(p) , \qquad (10)$$

where $G'(p) = g(p) = 1/f(x_p)$ is the quantile density of $G$ at $p$. One can show that
$E_F[\mathrm{IF}(Z; G(\cdot,p),F)] = 0$ and $\mathrm{Var}_F[\mathrm{IF}(Z; G(\cdot,p),F)] = E_F[\mathrm{IF}^2(Z; G(\cdot,p),F)] =
p(1-p)\,g^2(p)$. The reason for calculating this variance is that it arises in the asymptotic
variance of the functional applied to the empirical distribution $F_n$, namely $G(F_n,p)$; that
is, $n\,\mathrm{Var}[G(F_n,p)] \doteq p(1-p)\,g^2(p)$; and sometimes a simple expression for the asymptotic
variance is not otherwise available.

The influence function of the ratio of two quantiles $\rho_{p,q}(F) = x_p/x_q = G(\cdot,p)/G(\cdot,q)$
is then by elementary calculus and (10) found to be

$$\mathrm{IF}(z;\rho_{p,q},F) = \frac{\mathrm{IF}(z;G(\cdot,p),F)}{x_q} - \frac{x_p\,\mathrm{IF}(z;G(\cdot,q),F)}{x_q^2}
= \frac{x_q\{p - I[x_p \geq z]\}\,g(p) - x_p\{q - I[x_q \geq z]\}\,g(q)}{x_q^2} . \qquad (11)$$

When expanded in a power series with respect to $\epsilon$, we have that $\rho_{p,q}(F_\epsilon^{(z)}) =
\rho_{p,q}(F) + \epsilon\,\mathrm{IF}(z;\rho_{p,q},F) + O(\epsilon^2)$. Consequently, it is of interest to study the influence
relative to $\rho_{p,q}(F)$, since large values of $\mathrm{IF}(z;\rho_{p,q},F)$ are not suggestive of high sensitivity if
the ratio at $F$ is very large.

To assess influence sensitivity relative to the size of the ratio at $F$, in Plot A of Figure 4
is shown $\mathrm{IF}(z;\rho_{p,q},F)/\rho_{p,q}(F)$ for $z \in [0,1]$ and $p \in (0.05,0.95)$ and $q = 1-p$. As one can
see, the influence increases quickly as $p$ approaches its boundaries when $z$ is close to zero.
In this situation either $x_p$ or $x_q$ is close to zero and therefore close to the contamination. In
Plot B we vary both $p$ and $q$ but fix the contamination $z = 0$. Again it can be seen that
the ratio estimator is especially sensitive to zero valued observations when either $p$ or $q$ is
close to 0. In practice, if a data set contains a mixture of zero valued observations together
with positive values then inference will be difficult if either $p$ or $q$ is small. In Section 5.1
simulations revealed that even a small proportion of zeroes could result in over conservative
intervals when $p$ or $q$ was equal to 0.05.

The influence function can also be used to calculate the asymptotic variance

$$n\,\mathrm{Var}[\rho_{p,q}(F_n) - \rho_{p,q}(F)] \doteq \mathrm{ASV}(\rho_{p,q};F) = E_F[\mathrm{IF}^2(Z;\rho_{p,q},F)]$$

by expanding (11) and noting that for the two cases $p < q$ and $p > q$ we have
$I[x_p \geq z]\,I[x_q \geq z] = I[x_p \geq z]$ and $I[x_q \geq z]$ respectively. This gives

$$\mathrm{ASV}(\rho_{p,q};F) = \frac{1}{x_q^4}\left[p(1-p)\,x_q^2\,g^2(p) + q(1-q)\,x_p^2\,g^2(q) - 2x_qx_p\,m(p,q)\,g(p)\,g(q)\right] \qquad (12)$$

where $m(p,q) = p(1-q)$ when $p < q$ and $q(1-p)$ when $p > q$. It can be verified that this
expression for the asymptotic variance is equal to (3). Also, for the special case $q = 1-p$ we
have simply

ASV(pp,i_p; F) = [xi-pg{p) - Xpg{l - p)A (13) 

Xl-p 

We assess the variability of the ratio estimator with respect to the magnitude of the ratio to
be estimated. Therefore, in Figure 5 we plot $\mathrm{ASV}(\rho_{p,1-p};F)/\rho^2_{p,1-p}(F)$ for $p$ in $(0.05,0.95)$
(Plot B). These plots show that the variance of the ratio estimator can be very large (relative
to the population ratio squared) when either $p$ or $q$ is close to zero. In practice, one needs
to be aware that ratios involving very small quantiles will have higher variability, and wider
intervals relative to the magnitude of the ratio will result.
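The agreement between the influence-function variance and the quadratic in (3) can be checked numerically. In this Python sketch the EXP(1) distribution with $p = 0.8$, $q = 0.2$ is an arbitrary example. Because the influence function of the ratio depends on $z$ only through two indicators, it is constant on three regions of $z$, so its moments under $F$ reduce to finite sums. The function names are ours.

```python
import math

def ratio_influence(z, p, q, xp, xq, gp, gq):
    """Influence function of x_p/x_q at contamination point z."""
    ind_p = 1.0 if xp >= z else 0.0
    ind_q = 1.0 if xq >= z else 0.0
    return (xq * (p - ind_p) * gp - xp * (q - ind_q) * gq) / xq**2

def asv_closed_form(p, q, xp, xq, gp, gq):
    """Closed-form asymptotic variance of the ratio estimator."""
    m = min(p, q) * (1 - max(p, q))
    return (p * (1 - p) * xq**2 * gp**2 + q * (1 - q) * xp**2 * gq**2
            - 2 * xq * xp * m * gp * gq) / xq**4

p, q = 0.8, 0.2
xp, xq = -math.log(1 - p), -math.log(1 - q)     # EXP(1) quantiles
gp, gq = 1 / (1 - p), 1 / (1 - q)               # EXP(1) quantile densities
lo, hi = min(xp, xq), max(xp, xq)

# One representative z per region, paired with the probability of that region under F.
regions = [(lo / 2, min(p, q)), ((lo + hi) / 2, abs(p - q)), (hi + 1.0, 1 - max(p, q))]
mean_if = sum(w * ratio_influence(z, p, q, xp, xq, gp, gq) for z, w in regions)
asv_from_if = sum(w * ratio_influence(z, p, q, xp, xq, gp, gq) ** 2 for z, w in regions)
asv_12 = asv_closed_form(p, q, xp, xq, gp, gq)

# The quadratic a0 + a1*rho + a2*rho^2 evaluated at the true ratio.
rho = xp / xq
h2 = (p * (1 - p) * gp**2 - 2 * rho * min(p, q) * (1 - max(p, q)) * gp * gq
      + rho**2 * q * (1 - q) * gq**2) / xq**2
print(round(mean_if, 12), round(asv_from_if, 6), round(asv_12, 6), round(h2, 6))
```

All three variance values coincide up to rounding error, and the influence function has mean zero under $F$.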

Breakdown points. 

The asymptotic breakdown point $\epsilon^* = \epsilon^*(T,F)$ of a functional $T(F)$ is roughly speaking the
minimum proportion of contamination of $F$ to $F_\epsilon^{(z)}$ that can render useless $T(F_\epsilon^{(z)})$, as $z$ varies
over the support of $F$. This $\epsilon^*(T,F)$ is often free of $F$ and gives an indication of how sensitive
the functional $T(F)$ and its estimator $T(F_n)$ are to possible contamination. Unfortunately,
rigorous definitions and mathematical arguments for finding such breakdown points are often
complicated, see Genton (2003) and references therein. Here we give a somewhat heuristic
derivation of the breakdown point for the functional $T_{p,q}(F) = \rho_{p,q} = T_p(F)/T_q(F)$, where
$T_p(F) = F^{-1}(p)$ and $F$ is continuous and strictly increasing on $(0,\infty)$.

It is well known and intuitively clear that the breakdown point of $T_p(F)$ itself is $\epsilon^*(T_p,F) =
\min\{p, 1-p\}$. This is because if $\epsilon > p$ one can move the $p$th quantile $T_p(F_\epsilon^{(z)})$ to 0 by
choice of $z$, and if $\epsilon > 1-p$ one can make it move towards $+\infty$. And for any $\epsilon < \min\{p, 1-p\}$
the contamination cannot move $T_p(F_\epsilon^{(z)})$ to one of its boundaries.

The functional $T_{p,q}(F) = \rho_{p,q}$ is more complicated, and ‘breaks down’ if either $T_p$ or $T_q$
breaks down (because then the ratio is 0, $+\infty$ or undefined, and hence uninformative). It
also breaks down if $T_p(F_\epsilon^{(z)}) = T_q(F_\epsilon^{(z)})$ (because then the ratio is 1, another uninformative
value); and this can be arranged if and only if $\epsilon > |p-q|$ by taking $z = x_p$. Putting these
facts together, the breakdown point for the ratio of quantiles equals $\epsilon^*(T_{p,q}) = \min\{p, 1-p,
q, 1-q, |p-q|\} > 0$. This breakdown point is clearly maximized by taking $p = 1/3$, $q = 2/3$
or $p = 2/3$, $q = 1/3$.
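The breakdown point formula is simple enough to explore directly. The Python sketch below evaluates it and confirms by a grid search (the grid step of 1/300 is our choice) that the maximum is attained at the pair of levels 1/3 and 2/3; the function name is ours.

```python
def ratio_breakdown_point(p, q):
    """Heuristic breakdown point min{p, 1-p, q, 1-q, |p-q|} of the ratio x_p/x_q."""
    return min(p, 1 - p, q, 1 - q, abs(p - q))

# Grid search over (p, q) in steps of 1/300, which contains 1/3 and 2/3 exactly.
grid = [k / 300 for k in range(1, 300)]
best = max(((ratio_breakdown_point(p, q), p, q) for p in grid for q in grid),
           key=lambda t: t[0])

p90_p10 = ratio_breakdown_point(0.9, 0.1)   # the commonly reported P90/P10 ratio
print(best, round(p90_p10, 3))
```

For comparison, the P90/P10 ratio has breakdown point 0.1, well below the maximum of 1/3.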


6 Discussion and further research 

While point estimators of the ratio of percentiles from a single sample are commonplace, 
accompanying standard errors and/or interval estimators of such ratios are now possible. We 
have shown that such procedures are necessary because what are usually considered large 
samples do not by any means guarantee that variability is negligible in the ratio estimates. 

We compared two interval estimators of the quantile ratios, one based on the studentized 
log-transformation, and the other on variance stabilization. While asymptotically equivalent, 
simulations showed that the coverage of the VST intervals was slightly better than the log 
intervals, although both are somewhat conservative for moderate sample sizes. However, the 
log-transformed ratios are more amenable to computing two-sample tests from independent
samples, as described in Section 7.5.

One may be able to reduce the conservative coverage of both intervals by using a bias
correction; for example by subtracting an estimate of the bias in $\ln(\hat\rho)$, see Equation (4).
However, we tried this and other bias correction methods for the variance-stabilized estimate
ratio, to no avail. Finally, it may well be possible to choose sample sizes to achieve a desired
relative width in the confidence intervals over a large class of distributions.

The good robustness properties of simple ratios of quantiles are desirable in all inequality 
measures; and, no doubt replacing moments by appropriate quantiles in more sophisticated 
inequality measures is possible and another area of further research.

7 Appendix

7.1 ABS Data

The number of persons (in thousands) for each income category is listed in Table 7: for example, there
were 73,700 persons with no income in the financial year beginning 1 July, 2005. The total
number of persons in this year is estimated at 19,930,700. Of course not all households were
sampled and converted to equivalized disposable income per person. On page 25 of the same
ABS document one finds that the sample size of households was 9,961 for 2005 and 14,569
for 2011. Thus the figures in Table 7 are only estimates based on what was found in the
samples, and then converted to population estimates.

Table 7 here.

The original sample equivalized data are not readily available, so we ‘reconstructed’ the
sample by generating random numbers within each income range in proportion to those
in Table 7. The ABS informs us that different weights for each income group were used to
generate the table, and these are confidential for privacy reasons, so our reconstructed sample
will differ from theirs; nevertheless we think the differences are negligible for our purposes.
Also, we will truncate the income data to the interval (0, 2000] for two reasons: first, to
obtain a sample from a continuous data set by excluding the positive mass at 0; and second
because the largest category ‘2000 or more’ is unbounded. Our reconstructed sample for 2005
has size 9961(1 - (73.7 + 506.2)/19930.7) = 9671 and similarly for 2011 it is 13904.
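The reconstruction can be sketched in Python as follows. The sample-size calculation uses the figures quoted above. The income bins and counts in the example are hypothetical placeholders (Table 7 is not reproduced here), and drawing uniformly within each bin is our assumption, since the text only says that random numbers were generated within each range.

```python
import random

def reconstructed_size(n_households, excluded_thousands, total_thousands):
    """Sample size after removing the share of persons in the excluded categories."""
    return round(n_households * (1 - excluded_thousands / total_thousands))

def reconstruct_sample(bins, n, rng):
    """Draw n incomes: pick a bin with probability proportional to its count,
    then draw uniformly inside it (uniformity is an assumption of this sketch)."""
    weights = [count for _, _, count in bins]
    chosen = rng.choices(bins, weights=weights, k=n)
    return [rng.uniform(lower, upper) for lower, upper, _ in chosen]

# Figures from the text: 9,961 households; 73.7 and 506.2 thousand persons excluded.
size_2005 = reconstructed_size(9961, 73.7 + 506.2, 19930.7)

# Hypothetical (lower, upper, persons in thousands) bins, for illustration only.
example_bins = [(0, 400, 3000.0), (400, 800, 8000.0),
                (800, 1200, 5000.0), (1200, 2000, 3000.0)]
sample = reconstruct_sample(example_bins, size_2005, random.Random(0))
print(size_2005, round(min(sample), 1), round(max(sample), 1))
```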


7.2 Derivation of (4)-(7) 


In what follows, we use the general approximations derived from Taylor expansions
$E[\ln(U)] \doteq \ln(E[U]) - \mathrm{Var}[U]/\{2E^2[U]\}$ and $\mathrm{Var}[\ln(U)] \doteq \mathrm{Var}[U]/\{E^2[U]\}$, and similarly for $V$. Further,
we need

$$E[\ln(U)\ln(V)] \doteq \ln(E[U])\ln(E[V]) + \frac{\mathrm{Cov}[U,V]}{E[U]E[V]} - \frac{\ln(E[V])\,\mathrm{Var}[U]}{2E^2[U]} - \frac{\ln(E[U])\,\mathrm{Var}[V]}{2E^2[V]} .$$

Combining the above formulae, the approximate variance of $\ln(U/V)$ is

$$\mathrm{Var}\left[\ln\left(\frac{U}{V}\right)\right] \doteq \frac{\mathrm{Var}[U]}{E^2[U]} + \frac{\mathrm{Var}[V]}{E^2[V]} - \frac{2\,\mathrm{Cov}[U,V]}{E[U]E[V]} .$$

Applying these approximations to $U = \hat{x}_p$ and $V = \hat{x}_q$ yields (4) and (5). By the Delta
Theorem (DasGupta, 2006, p.40), $(\hat\theta - \theta)/\sqrt{\mathrm{Var}[\hat\theta]}$ converges in distribution to a standard
normal distribution, so a large sample $100(1-\alpha)\%$ confidence interval for $\theta$ is given by
$\hat\theta \mp z_{1-\alpha/2}\sqrt{\mathrm{Var}[\hat\theta]}$. This leads immediately to the interval (6) having the same confidence
for $\rho_{p,q}$. By expanding the exponentials appearing in (6) in series, it is found that the widths
of these intervals $W_S = U_S - L_S$ can be expressed in terms of $\rho$ and $\mathrm{Var}[\hat\theta]$ as shown in (7).


7.3 Quantile density estimation 

The confidence intervals (6) and (8) described previously require estimates of $a_0$, $a_1$ and $a_2$
appearing in the asymptotic variance quadratic (3), which require estimates of $\sigma_p$, $\sigma_q$ and
$\sigma_{p,q}$ defined in (2); and these in turn require estimates of the quantile densities $g(p)$ and $g(q)$.
There have been many contributors to this problem and we refer the reader to Prendergast
& Staudte (2015) for background and results on kernel density estimators of the form
$\hat{g}(p) = \sum_{i=1}^{n} X_{(i)}\{k_b(p - \frac{i-1}{n}) - k_b(p - \frac{i}{n})\}$, where $b$ is a bandwidth and $k_b(\cdot) = k(\cdot/b)/b$ for some
kernel function $k$ which is an even function on $[-1,1]$. We follow Prendergast & Staudte
(2015) in using the Epanechnikov kernel with an estimated optimal bandwidth. The optimal
bandwidth depends on the quantile optimality ratio $\mathrm{QOR}(u) = g(u)/g''(u)$, and the QOR for
an assumed underlying log-normal distribution can be used for many unimodal distributions
supported on the half-infinite interval $[0,\infty)$; a boundary correction is included for quantiles
near 0. Alternatively, one can calculate the QOR assuming that the underlying density
can be well-approximated by the highly flexible generalized lambda distribution (GLD), see
Section 7.4.

The intervals (8) are derived exactly as for the quantile-based skewness coefficients in
(Staudte, 2014, Sec. 3.3) and displayed in Equation 9 of that paper. One only needs to
replace the coefficients in the quadratic defining the asymptotic variance by the simpler ones
needed here (3). It is also shown there that the width $W_V = U_V - L_V$ can be expressed as
$W_V = 2\,h(\rho)\,z_{1-\alpha/2}/\sqrt{n}$ plus a term of smaller order in probability. The leading term of this expression is exactly equal
to that in (7), which is the asymptotic width for the interval $[L,U]_S$ based on studentization.
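A bare-bones Python version of such a kernel quantile density estimator is sketched below. It assumes the estimator has the form $\hat g(p) = \sum_i X_{(i)}\{k_b(p - (i-1)/n) - k_b(p - i/n)\}$ with $k_b(\cdot) = k(\cdot/b)/b$ and the Epanechnikov kernel, and it takes the bandwidth $b$ as a user-supplied argument. The QOR-based optimal bandwidth and the boundary correction mentioned above are not implemented, and the function names are ours.

```python
import math

def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def quantile_density_estimate(data, p, b):
    """Kernel estimate of g(p) = 1/f(x_p) with a fixed, user-chosen bandwidth b."""
    xs = sorted(data)
    n = len(xs)

    def kb(t):
        return epanechnikov(t / b) / b

    # Weighted sum of order statistics; the weights are differences of the scaled kernel.
    return sum(x * (kb(p - (i - 1) / n) - kb(p - i / n))
               for i, x in enumerate(xs, start=1))

# Deterministic check data: an evenly spaced "uniform" sample, for which g(p) = 1,
# and exact EXP(1) quantiles at mid-point levels, for which g(0.5) = 2.
uniform_like = [i / 1000 for i in range(1, 1001)]
exp_like = [-math.log(1 - (i - 0.5) / 2000) for i in range(1, 2001)]
g_unif = quantile_density_estimate(uniform_like, 0.5, 0.1)
g_exp = quantile_density_estimate(exp_like, 0.5, 0.1)
print(round(g_unif, 4), round(g_exp, 4))
```

The small upward offset in the exponential example is smoothing bias from the fixed bandwidth, which is what an optimally chosen bandwidth is meant to balance against variance.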

7.4 GLD methods and results 

OLD QOR identihes another approach when the underlying distribution is assumed to be at 
least close to a member of the highly-flexible generalized lambda distribution. For more on 
the estimation of the quantile density see Appendix 7.3 and Prendergast & Staudte (2015). 

In general, the VST intervals are slightly narrower than the studentized intervals and
consequently slightly less conservative. Additionally, there may be some small gain from using
the GLD QOR, in particular when the distribution is not the log-normal. However, the
log-normal QOR provides a good bandwidth and is easier to compute. Given that there were
10,000 iterations used in the simulations, we used method of moments estimators for the
GLD distribution, which were comparatively quick to compute.

Table 8 here.

There are various other GLD estimators available (for a recent discussion see Corlu &
Meterelliyoz, 2015). The R packages gld (King et al., 2014) and GLDEX (Su et al., 2007)
provide various GLD estimators. Using the parameterisation of Freimer et al. (1988) (the
FKML parameterisation), some small improvements may result when using the GLD QOR, as
seen in Table 8, although requiring the estimation of four parameters increases the
computational complexity.
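
To make the role of the four parameters concrete, the following R sketch writes out the FKML quantile function together with its quantile density and the resulting QOR. It is a minimal illustration and not the code used for the simulations: the function names are hypothetical, the third and fourth parameters are assumed to be nonzero, and the parameter values in the example call are arbitrary rather than fitted to any data.

```r
# FKML parameterisation of the GLD (Freimer et al., 1988), assuming l3 != 0
# and l4 != 0:
#   Q(u) = l1 + ((u^l3 - 1)/l3 - ((1 - u)^l4 - 1)/l4)/l2
gldQuantile <- function(u, l1, l2, l3, l4) {
  l1 + ((u^l3 - 1)/l3 - ((1 - u)^l4 - 1)/l4)/l2
}

# Quantile density: first derivative of Q with respect to u.
gldQuantileDensity <- function(u, l1, l2, l3, l4) {
  (u^(l3 - 1) + (1 - u)^(l4 - 1))/l2
}

# Second derivative of the quantile density with respect to u.
gldQuantileDensity2 <- function(u, l1, l2, l3, l4) {
  ((l3 - 1)*(l3 - 2)*u^(l3 - 3) + (l4 - 1)*(l4 - 2)*(1 - u)^(l4 - 3))/l2
}

# Quantile optimality ratio: quantile density over its second derivative.
# For some parameter values this ratio can be negative, in which case a
# fractional power of it is not defined.
gldQOR <- function(u, l1, l2, l3, l4) {
  gldQuantileDensity(u, l1, l2, l3, l4)/gldQuantileDensity2(u, l1, l2, l3, l4)
}

# Arbitrary illustrative parameter values (not estimates from data)
gldQOR(c(0.1, 0.5, 0.9), l1 = 0, l2 = 1, l3 = 0.2, l4 = 0.1)
```

The location parameter does not enter the quantile density, and the scale parameter cancels in the QOR, so under these assumptions only the two shape parameters affect the bandwidth.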


7.5 Intervals comparing two independent ratios 


The theory supporting the studentized log-transformed ratios can also be extended to consider
the difference between two independent log-transformed ratio estimators. For simplicity we
will assume that the same p and q are used for each of the estimators, although this is
technically not required. Let $\hat\rho_x = \hat\rho_x(p,q)$ and $\hat\rho_y = \hat\rho_y(p,q)$ be estimates of the percentile
ratios $\rho_x$ and $\rho_y$ respectively. Further, let $\hat\theta_x = \ln(\hat\rho_x)$ and $\hat\theta_y = \ln(\hat\rho_y)$, where the asymptotic
variances, $\mathrm{Var}(\hat\theta_x)$ and $\mathrm{Var}(\hat\theta_y)$, for each can be obtained from (5). Then a large sample
$100(1-\alpha)\%$ confidence interval for $\ln(\rho_x) - \ln(\rho_y)$ is

$$(\hat\theta_x - \hat\theta_y) \pm z_{1-\alpha/2}\sqrt{\mathrm{Var}(\hat\theta_x) + \mathrm{Var}(\hat\theta_y)} \qquad (14)$$

or, for $\rho_x/\rho_y$ to be interpreted on a ratio scale,

$$\frac{\hat\rho_x}{\hat\rho_y}\,\exp\left\{\mp z_{1-\alpha/2}\sqrt{\mathrm{Var}(\hat\theta_x) + \mathrm{Var}(\hat\theta_y)}\right\}. \qquad (15)$$
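
The intervals (14) and (15) are simple to compute once the two ratio estimates and their variance estimates are available. The following R sketch shows one way to do so. The function name twoRatioCI and its arguments are hypothetical, the variance estimates are assumed to be supplied by the user (for example the squared standard error of each log-transformed ratio), and the numbers in the example call are made up purely for illustration.

```r
# Interval (14) for ln(rho_x) - ln(rho_y) and interval (15) for rho_x/rho_y.
#
# Args (all hypothetical names):
#   rho.x, rho.y: estimated quantile ratios from two independent samples.
#   var.x, var.y: estimated variances of the log-transformed ratio estimators.
#   conf.level:   nominal coverage of the intervals.
twoRatioCI <- function(rho.x, var.x, rho.y, var.y, conf.level = 0.95) {
  zcrit <- qnorm(1 - (1 - conf.level)/2)
  diff.hat <- log(rho.x) - log(rho.y)
  # Independence of the two samples means the variances add.
  se <- sqrt(var.x + var.y)
  log.CI <- diff.hat + c(-1, 1)*zcrit*se
  list(log.diff = diff.hat, log.CI = log.CI,
       ratio = rho.x/rho.y, ratio.CI = exp(log.CI))
}

# Made-up inputs, for illustration only
twoRatioCI(rho.x = 13.0, var.x = 0.010, rho.y = 40.0, var.y = 0.020)
```

Because (15) is the exponential of (14), the two intervals cover or fail to cover together, which is why a single coverage probability applies to both.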


The good empirical coverage probabilities for the interval estimates of a single ratio suggest
good approximations for the standard error, which in turn suggest that good coverage is
achievable when considering two independent ratios. We provide some brief verification here
via simulation and note that these coverage probability results are for both of the interval
estimators in (14) and (15), which are equivalent in this regard.

Empirical coverage probabilities computed over 10,000 simulation runs are reported in 
Table 9. The sample sizes were n and m for the two groups, with data sampled 
from the LN(0,1) and LN(0.2,1.5) distributions respectively. While slightly conservative, for 
each of the differences in percentile ratios considered for this simulation, the coverage does 
not drop below the nominal level of 0.95. Additionally, improved coverage is observed for 
increasing sample sizes. 


Table 9 
here. 
7.6 R script for computing confidence intervals 

############# R script by Luke A. Prendergast, 28 August, 2015

Epanechnikov <- function(u) {
  # Epanechnikov kernel, supported on [-1, 1].
  3*(1 - u^2)*(abs(u) <= 1)/4
}

QuantileDensity <- function(x, p, correct = TRUE) {
  # This function computes the quantile density associated with
  # the p-th quantile. The Epanechnikov kernel density estimator
  # is used with an optimal bandwidth selected based on the QOR
  # for the LNORM distribution.
  #
  # Args:
  #   x: A numeric vector.
  #   p: A numeric value between 0 and 1.
  #   correct: If correct = TRUE then a boundary correction will
  #            be carried out if p is less than the bandwidth.

  # Compute the QOR for the LNORM distribution.
  qPhiu <- 1/dnorm(qnorm(p))
  qPhipru <- qnorm(p)*qPhiu^2
  qPhiprpru <- (qPhiu^3)*(1 + 2*qnorm(p)^2)
  QLNu <- qlnorm(p)
  qLNu <- QLNu*qPhiu
  qLNpru <- qLNu*qPhiu + QLNu*qPhipru
  qLNprpru <- qLNpru*qPhiu + 2*qLNu*qPhipru + QLNu*qPhiprpru
  qratio <- qLNu/qLNprpru

  # Optimal bandwidth, truncated at p if the boundary correction is used.
  n <- length(x)
  bw <- (15^(1/5))*(qratio)^(2/5)/(n^(1/5))
  if (correct) bw <- min(p, bw)

  # Kernel-weighted sum of the order statistics.
  xsort <- sort(x)
  consts <- (Epanechnikov((p - (1:n - 1)/n)/bw)
             - Epanechnikov((p - (1:n)/n)/bw))/bw
  return(sum(xsort*consts))
}





ratioCI <- function(x, p, q, conf.level = 0.95, correct = TRUE) {
  # This function computes the studentised and VST confidence
  # intervals for the ratio of the p-th to q-th quantiles.
  #
  # Args:
  #   x: A numeric vector.
  #   p: A numeric value between 0 and 1.
  #   q: A numeric value between 0 and 1.
  #   conf.level: A numeric value between 0 and 1 specifying
  #               the coverage probability for the intervals.
  #   correct: Choice to carry out boundary correction passed
  #            to QuantileDensity.

  zcrit <- qnorm(1 - (1 - conf.level)/2)
  n <- length(x)

  # Sample quantiles and their ratio.
  Ghat <- quantile(x, c(p, q), type = 8, names = FALSE)
  xphat <- Ghat[1]
  xqhat <- Ghat[2]
  rhopqhat <- xphat/xqhat

  # Quantile density estimates at p and q.
  gphat <- QuantileDensity(x, p, correct = correct)
  gqhat <- QuantileDensity(x, q, correct = correct)

  mpq <- min(p, q)
  Mpq <- max(p, q)

  # The VST interval
  a0hat <- (p*(1 - p)*gphat^2)/xqhat^2
  a1hat <- -2*mpq*(1 - Mpq)*gphat*gqhat/xqhat^2
  a2hat <- (q*(1 - q)*gqhat^2)/xqhat^2
  hsqhat <- a0hat + a1hat*rhopqhat + a2hat*rhopqhat^2
  lhat <- a1hat + 2*a2hat*rhopqhat
  asymSErhopqhat <- sqrt(hsqhat/n)
  Dhat <- sqrt(4*a0hat*a2hat - a1hat^2)
  chat <- zcrit*sqrt(a2hat/n)
  CI.vst <- (Dhat*sinh(asinh(lhat/Dhat) + c(-1, 1)*chat) - a1hat)/(2*a2hat)

  # The studentized interval
  nvarthetahat <- p*(1 - p)*gphat^2/xphat^2 +
    q*(1 - q)*gqhat^2/xqhat^2 - 2*mpq*(1 - Mpq)*gphat*gqhat/(xphat*xqhat)
  sigma_n <- sqrt(nvarthetahat/n)
  CI.stud <- rhopqhat*exp(c(-1, 1)*zcrit*sigma_n)

  CIs <- rbind(CI.vst, CI.stud)
  rownames(CIs) <- c("VST", "Stud")

  return(list(rho.hat = rhopqhat, CIs = CIs))
}

############################################################################## 

# An example for LNORM generated data 

p <- 0.9 
q <- 0.1 

true.rho <- qlnorm(p)/qlnorm(q) 
true.rho 

x <- rlnorm(1000)
ratioCI(x, 0.9, 0.1)

Table 1: Ratios of EWI percentiles reported on page 25 of ABS (2011) over selected years
from 2003 to 2011. PX/PY denotes the ratio of the X-th percentile to the Y-th percentile.

Ratio      2003   2005   2007   2009   2011
P90/P10    3.87   4.05   4.35   4.24   4.10
P80/P20    2.55   2.58   2.60   2.70   2.61
P80/P50    1.53   1.55   1.58   1.60   1.56
P20/P50    0.60   0.60   0.59   0.59   0.60


Table 2: Estimated ratios $\hat\rho$ and distribution-free (DF) studentized-log and VST intervals
(Stud CI and VST CI; see Section 3.2 for these interval estimators) for the data depicted
in Figure 1. Also, empirical coverage probabilities cp and mean widths w based on 10,000
simulation runs from the fitted gamma distributions used to overlay the densities in Figure
1. The ratios for the fitted gamma are denoted $\rho$.

2005
                        90/10          80/20          80/50          20/50
DF              rho-hat 3.888          2.502          1.515          0.605
        Stud    CI      [3.81, 3.97]   [2.46, 2.55]   [1.50, 1.53]   [0.596, 0.614]
        VST     CI      [3.81, 3.97]   [2.46, 2.55]   [1.50, 1.53]   [0.596, 0.614]
Fitted          rho     3.872          2.419          1.507          0.623
Gamma   VST     cp      0.954          0.952          0.952          0.952
                w       0.201          0.092          0.039          0.020
        Stud    cp      0.954          0.952          0.952          0.952
                w       0.201          0.093          0.039          0.020

2011
                        90/10          80/20          80/50          20/50
DF              rho-hat 3.766          2.530          1.535          0.606
        Stud    CI      [3.70, 3.83]   [2.49, 2.57]   [1.52, 1.55]   [0.599, 0.614]
        VST     CI      [3.70, 3.83]   [2.49, 2.57]   [1.52, 1.55]   [0.599, 0.614]
Fitted          rho     3.678          2.34           1.485          0.635
Gamma   VST     cp      0.954          0.956          0.949          0.956
                w       0.152          0.072          0.031          0.016
        Stud    cp      0.954          0.956          0.949          0.955
                w       0.152          0.072          0.031          0.016

Table 3: Values of $\rho_{p,q}$ for the three distributions LN(0,1), χ²₃ and Pareto(2) for which the
coverage probabilities and interval widths are reported in Table 4.

         5/95   10/90   20/80   80/20   90/10     95/5
LN       0.04    0.08    0.19    5.38   12.98    26.84
χ²₃      0.04    0.09    0.22    4.62   10.70    22.21
PAR      0.01    0.03    0.10   10.47   39.97   133.66

Table 4: Coverage probabilities (cp) and mean (w) interval width for the VST and studentized
intervals (Stud) based on the lognormal-QOR bandwidth.

    n   F                     5/95     10/90     20/80     80/20     90/10       95/5
  100   LN    VST    cp      0.965     0.966     0.970     0.964     0.965      0.966
                     w       0.057     0.083     0.147     4.318    14.354     41.977
              Stud   cp      0.966     0.968     0.972     0.966     0.967      0.969
                     w       0.059     0.085     0.148     4.365    14.743     45.015
        χ²₃   VST    cp      0.947     0.956     0.963     0.964     0.959      0.952
                     w       0.062     0.095     0.155     3.477    12.013     39.261
              Stud   cp      0.950     0.961     0.965     0.965     0.960      0.956
                     w       0.067     0.098     0.157     3.483    12.084     39.704
        PAR   VST    cp      0.958     0.956     0.966     0.963     0.960      0.957
                     w       0.067     0.045     0.112    13.222    83.943    611.063
              Stud   cp      0.966     0.963     0.967     0.970     0.966      0.965
                     w       0.074     0.048     0.115    13.417    88.059   2585.756
  250   LN    VST    cp      0.966     0.968     0.969     0.964     0.971      0.964
                     w       0.032     0.049     0.088     2.582     8.359     23.123
              Stud   cp      0.970     0.966     0.969     0.964     0.971      0.966
                     w       0.032     0.050     0.089     2.591     8.428     23.520
        χ²₃   VST    cp      0.959     0.961     0.962     0.962     0.960      0.958
                     w       0.039     0.059     0.097     2.104     7.033     21.030
              Stud   cp      0.964     0.962     0.964     0.961     0.962      0.960
                     w       0.040     0.060     0.097     2.105     7.047     21.112
        PAR   VST    cp      0.960     0.957     0.957     0.963     0.959      0.959
                     w       0.011     0.025     0.066     7.511    43.094    225.703
              Stud   cp      0.962     0.959     0.960     0.965     0.962      0.964
                     w       0.012     0.026     0.067     7.545    43.664    233.251
  500   LN    VST    cp      0.969     0.963     0.961     0.963     0.964      0.970
                     w       0.022     0.034     0.061     1.771     5.739     15.761
              Stud   cp      0.970     0.964     0.963     0.963     0.964      0.970
                     w       0.022     0.034     0.061     1.774     5.760     15.875
        χ²₃   VST    cp      0.960     0.963     0.958     0.961     0.962      0.963
                     w       0.028     0.041     0.068     1.454     4.848     14.231
              Stud   cp      0.960     0.964     0.960     0.961     0.961      0.963
                     w       0.028     0.042     0.068     1.455     4.853     14.256
        PAR   VST    cp      0.960     0.957     0.962     0.958     0.957      0.959
                     w       0.007     0.017     0.046     5.081    28.285    140.555
              Stud   cp      0.961     0.959     0.961     0.960     0.958      0.959
                     w       0.008     0.017     0.046     5.090    28.446    142.368

Table 5: Large sample empirical coverage probabilities (cp) and average interval width (w) for the
VST and studentized intervals (Stud) using the lognormal QOR.

    n   F                     5/95     10/90     20/80     80/20     90/10       95/5
 1000   LN    VST    cp      0.966     0.964     0.959     0.959     0.963      0.968
                     w       0.015     0.023     0.042     1.223     3.928     10.727
              Stud   cp      0.966     0.963     0.959     0.959     0.963      0.967
                     w       0.015     0.023     0.042     1.224     3.935     10.761
        χ²₃   VST    cp      0.955     0.958     0.958     0.956     0.955      0.962
                     w       0.019     0.029     0.047     1.010     3.342      9.775
              Stud   cp      0.957     0.958     0.957     0.956     0.955      0.962
                     w       0.020     0.029     0.047     1.010     3.344      9.784
        PAR   VST    cp      0.954     0.955     0.956     0.956     0.956      0.958
                     w       0.005     0.012     0.032     3.522    19.396     93.367
              Stud   cp      0.954     0.957     0.956     0.958     0.955      0.959
                     w       0.005     0.012     0.032     3.525    19.446     93.893
 5000   LN    VST    cp      0.958     0.956     0.953     0.952     0.955      0.957
                     w       0.006     0.010     0.018     0.534     1.690      4.512
              Stud   cp      0.958     0.955     0.953     0.953     0.955      0.956
                     w       0.006     0.010     0.018     0.534     1.691      4.515
        χ²₃   VST    cp      0.957     0.954     0.948     0.950     0.957      0.957
                     w       0.008     0.013     0.021     0.443     1.448      4.175
              Stud   cp      0.956     0.954     0.949     0.951     0.957      0.957
                     w       0.008     0.013     0.021     0.443     1.448      4.175
        PAR   VST    cp      0.949     0.954     0.956     0.956     0.954      0.956
                     w       0.002     0.005     0.014     1.534     8.372     39.466
              Stud   cp      0.949     0.954     0.956     0.955     0.955      0.956
                     w       0.002     0.005     0.014     1.535     8.376     39.503
10000   LN    VST    cp      0.956     0.952     0.954     0.953     0.953      0.958
                     w       0.004     0.007     0.013     0.375     1.185      3.153
              Stud   cp      0.956     0.953     0.954     0.953     0.953      0.957
                     w       0.004     0.007     0.013     0.375     1.185      3.154
        χ²₃   VST    cp      0.951     0.954     0.950     0.952     0.949      0.952
                     w       0.006     0.009     0.015     0.312     1.018      2.925
              Stud   cp      0.952     0.954     0.949     0.952     0.948      0.952
                     w       0.006     0.009     0.015     0.312     1.018      2.925
        PAR   VST    cp      0.951     0.951     0.951     0.951     0.957      0.955
                     w       0.002     0.004     0.010     1.079     5.890     27.727
              Stud   cp      0.950     0.951     0.951     0.951     0.957      0.954
                     w       0.002     0.004     0.010     1.079     5.891     27.740

Table 6: Coverage probabilities for the VST intervals using the lognormal QOR with the
proportion ε of zeroes in the mixture distribution set to 0.01, 0.02 and 0.05. A "-" marks
a cell that is empty in the source.

   ε       n   F        5/95    10/90    20/80    80/20    90/10     95/5
0.01    1000   LN      0.977    0.959    0.962    0.959    0.956    0.974
               χ²₃     0.960    0.960    0.964    0.965    0.973    0.965
               PAR     0.954    0.954    0.956    0.966    0.957    0.957
        5000   LN      0.967    0.958    0.950    0.962    0.961    0.968
               χ²₃     0.967    0.944    0.958    0.952    0.951    0.951
               PAR     0.949    0.957    0.956    0.960    0.948    0.961
       10000   LN      0.959    0.955    0.962    0.966    0.949    0.957
               χ²₃     0.958    0.961    0.958    0.945    0.958    0.963
               PAR     0.947    0.957    0.946    0.951    0.953    0.950
0.02    1000   LN      0.974    0.976    0.960    0.967    0.968    0.979
               χ²₃     0.950    0.968    0.959    0.966    0.961    0.956
               PAR     0.939    0.950    0.954    0.963    0.962    0.934
        5000   LN      0.979    0.958    0.955    0.958    0.958    0.974
               χ²₃     0.970    0.965    0.950    0.953    0.958    0.958
               PAR     0.948    0.946    0.966    0.966    0.957    0.940
       10000   LN      0.976    0.958    0.953    0.945    0.957    0.974
               χ²₃     0.952    0.947    0.954    0.950    0.952    0.965
               PAR     0.963    0.952    0.956    0.950    0.950    0.938
0.05    1000   LN          -    0.973    0.971    0.958    0.972        -
               χ²₃         -    0.940    0.959    0.961    0.939        -
               PAR         -    0.923    0.955    0.961    0.932        -
        5000   LN          -    0.982    0.954    0.953    0.973        -
               χ²₃         -    0.967    0.960    0.945    0.972        -
               PAR         -    0.947    0.936    0.955    0.930        -
       10000   LN          -    0.978    0.956    0.955    0.981        -
               χ²₃         -    0.963    0.951    0.954    0.966        -
               PAR         -    0.945    0.964    0.955    0.959        -

Table 7: Australian equivalized weekly income (EWI) data for financial years beginning July
1, 2005 and July 1, 2011, in terms of 2011-2012 dollar values, adjusted for the consumer price
index. ABS (2011), Subset of Table on p. 27, Document 6523.0, 2011-2012; downloaded
29/03/2015.

                   Number of persons ('000)
EWI                2005-2006    2011-2012
No income               73.7         87.4
$1-$49                  90.1         83.7
$50-$99                 66.7        101.8
$100-$149               76.3         88.2
$150-$199              121.9        121.5
$200-$249              259.0        225.9
$250-$299              710.3        382.3
$300-$349             1244.6        475.3
$350-$399             1235.7       1221.4
$400-$449             1139.8       1097.8
$450-$499             1070.7       1133.0
$500-$599             2189.4       2026.0
$600-$699             2259.2       2040.7
$700-$799             1922.5       2191.2
$800-$899             1647.9       1983.0
$900-$999             1350.6       1467.7
$1000-$1099           1048.9       1522.2
$1100-$1399           1847.3       2816.8
$1400-$1699            735.2       1484.1
$1700-$1999            334.7        713.4
$2000 or more          506.2        925.5
Total                19930.7      22189.0

Table 8: Coverage probabilities (cp) and mean (w) interval widths for the VST and studentized
intervals (Stud) based on GLD QOR bandwidths.

    n   F                     5/95     10/90     20/80     80/20     90/10       95/5
  100   LN    VST    cp      0.958     0.973     0.970     0.960     0.972      0.970
                     w       0.057     0.081     0.148     4.312    14.220     40.999
              Stud   cp      0.961     0.972     0.973     0.965     0.974      0.972
                     w       0.058     0.083     0.149     4.358    14.607     43.458
        χ²₃   VST    cp      0.954     0.959     0.954     0.960     0.965      0.958
                     w       0.063     0.096     0.157     3.533    12.161     39.830
              Stud   cp      0.963     0.967     0.969     0.956     0.969      0.954
                     w       0.068     0.099     0.159     3.541    12.241     40.366
        PAR   VST    cp      0.944     0.960     0.970     0.957     0.967      0.961
                     w       0.022     0.043     0.112    13.263    80.343   1048.191
              Stud   cp      0.952     0.966     0.967     0.963     0.970      0.963
                     w       0.025     0.045     0.114    13.449    83.968   1124.213
  250   LN    VST    cp      0.968     0.966     0.967     0.967     0.965      0.969
                     w       0.032     0.049     0.089     2.607     8.358     22.975
              Stud   cp      0.965     0.964     0.965     0.967     0.966      0.970
                     w       0.032     0.050     0.089     2.616     8.427     23.358
        χ²₃   VST    cp      0.955     0.959     0.953     0.956     0.962      0.962
                     w       0.040     0.059     0.097     2.118     7.108     21.586
              Stud   cp      0.960     0.957     0.952     0.955     0.962      0.967
                     w       0.041     0.060     0.098     2.120     7.123     21.678
        PAR   VST    cp      0.951     0.954     0.951     0.960     0.959      0.959
                     w       0.011     0.025     0.066     7.511    42.221    223.830
              Stud   cp      0.957     0.953     0.950     0.968     0.961      0.968
                     w       0.011     0.026     0.066     7.543    42.732    230.261
  500   LN    VST    cp      0.969     0.957     0.972     0.969     0.956      0.948
                     w       0.021     0.034     0.062     1.781     5.724     15.708
              Stud   cp      0.972     0.957     0.973     0.971     0.954      0.952
                     w       0.021     0.034     0.062     1.784     5.744     15.818
        χ²₃   VST    cp      0.961     0.950     0.958     0.954     0.969      0.957
                     w       0.028     0.042     0.068     1.467     4.775     14.209
              Stud   cp      0.958     0.950     0.957     0.957     0.970      0.957
                     w       0.028     0.042     0.069     1.467     4.780     14.235
        PAR   VST    cp      0.957     0.960     0.954     0.959     0.963      0.967
                     w       0.007     0.017     0.046     5.054    28.252    142.167
              Stud   cp      0.961     0.959     0.961     0.964     0.956      0.963
                     w       0.008     0.017     0.046     5.064    28.407    143.924

Table 9: Coverage probabilities for the interval estimators in (14) and, equivalently, (15)
comparing percentile ratios from the LN(0,1) and LN(0.2,1.5) distributions. The sample
sizes are n and m respectively.

(n, m)           5/95    10/90    20/80    80/20    90/10     95/5
(200,100)       0.976    0.970    0.970    0.973    0.972    0.973
(500,1000)      0.969    0.965    0.963    0.963    0.964    0.966
(10000,5000)    0.959    0.957    0.953    0.952    0.956    0.960

Figure 1: Histograms of the data summarized in Table 7, after exclusion of the first and
last categories. Below each of them are density plots in solid lines of the reconstructed data
sets described in the text. Superimposed in dashed lines are fitted gamma densities with
respective (shape, scale) parameters (3.94, 184.88) for 2005 and (4.23, 197.61) for 2011.


Figure 2: Simulated coverage probability for the LOGN(0,1) distribution using the VST and 
studentized intervals for all combinations of p and q from 0.05, 0.06,..., 0.95. 1000 iterations 
were used for each combination. 


Figure 3: Simulated coverage probability for the Pareto(2) distribution using the VST and 
studentized intervals for all combinations of p and q from 0.05, 0.06,..., 0.95. 1000 iterations 
were used for each combination. 


Figure 4: Plots of $\mathrm{IF}(z; \rho_{p,q}, F)/\rho_{p,q}(F)$ for which $z \in [0,1]$, $p \in (0.05, 0.95)$ and $q = 1-p$
(Plot A) and with $z = 0$, $p \in (0.05, 0.95)$ and $q \in (0.05, 0.95)$ (Plot B).


Figure 5: Plots of ASV(pp 4 _p; F)/pp ;^_p(F) for p G (0.05,0.95) (Plot A) and with p G 
(0.05,0.95) and q = 1—p (Plot B). 

[Figure 2 here: panels "VST: n = 100" and "Studentized: n = 100".]

[Figure 3 here: panels "VST: n = 100" and "Studentized: n = 100".]